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A Sampling Strategy for High-Dimensional Spaces Applied 
to Free-Form Gravitational Lensing 
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ABSTRACT 

We present a novel proposal strategy for the Metropolis-Hastings algorithm designed to 
efficiently sample general convex poly topes in 100 or more dimensions. This improves 
upon previous sampling strategies used for free-form reconstruction of gravitational 
lenses, but is general enough to be applied to other fields. We have written a parallel 
implementation within the lens modeling framework GLASS. Testing shows that we 
are able to produce uniform uncorrelated random samples which are necessary for 
exploring the degeneracies inherent in lens reconstruction. 
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1 INTRODUCTION 

Some inversion problems in astrophysics make it desirable to 
search or sample a high dimensional solution domain S C M n 
such that S is bounded by the linear constraints 



Ax < b 



(1) 



where A £ R mXn and b is a constant vector. A classic ap- 
plication is Schwarzschild's construction of triaxial stellar 
systems in equilibrium (Schwarzschild 1979). Given a three- 
dimensional discretized target density function pj , the num- 
ber of stars d on a given orbit i is found by solving 



Pj 



(2) 



where <Jij is the orbit density. The orbit density is calculated 
a priori using test particles in a fixed potential correspond- 
ing to p. However, searching the model space was not feasible 
at the time and only some particular models were consid- 
ered. More recent work has further developed this technique 
(Schwarzschild 1982; Merritt 1999; Cappellari et al. 2006, 
and references therein). 

In this paper we consider applications to gravitational 
lensing. Lensing has had quite a long history, beginning with 
the first direct evidence of general relativity, but until 1979 
with the discovery of the extra-solar lens Q0957+561 (Young 
et al. 1981) the field was largely of only theoretical inter- 
est (Refsdal 1964b, a). Today more than one hundred strong 
lensing objects are known with many studied in great detail 
(e.g., Kochanek et al. 1999; Faure et al. 2008; Auger et al. 
2009). Future surveys promise to deliver thousands more. 
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Of utmost interest is the mass distribution of the lens- 
ing object. Characterizing this distribution is important for 
understanding the properties of galaxies and clusters (Read 
et al. 2007; Sereno et al. 2010), galaxy formation and evo- 
lution (Tortora et al. 2010; Faure et al. 2011), the nature of 
dark matter (Clowe et al. 2006), as well as estimating cos- 
mological parameters (Bartelmann & Schneider 2001) and 
the age of the universe (Saha et al. 2006; Oguri 2007; Coles 
2008). 

Crucially, the equations governing gravitational lensing 
are linear in the projected mass density n. As detailed in 
Section 2, one can discretize n onto a grid of pixels and solve 
for physically motivated solutions by imposing constraints 
in the form of Eq. (1). Several versions of this idea have been 
developed by Saha & Williams (2004), Coe et al. (2008), and 
Koopmans (2005). 

This free-form approach is more flexible than simple an- 
alytic models, which assume a functional form of the mass 
profile and may unintentionally break degeneracies. How- 
ever, this creates a large system of linear equations that is 
highly underconstrained. To understand the range of degen- 
eracies we therefore require a technique that can explore the 
space of solutions S. One possible technique is to choose a 
random point x and accept it if x lies in S. This might be a 
reasonable method in low dimensions n, as is done in Monte 
Carlo integration, but the probability of acceptance rapidly 
approaches zero as n increases. Each of the pixels in the dis- 
cretization of n represents one dimension and typically n is 
greater than 100. Complex systems, where multiple lenses 
are used, can easily have more than 1000 dimensions. The 
priors can also be arbitrary, so the simplex will have a very 
complex shape, although by construction it will always be 
convex. 

General sampling of probability distributions has been 
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a topic of statistics research for many years (e.g., Chib & 
Greenberg 1995; Robert & Casella 2005). In the case of 
lensing, the PixeLens algorithm (Saha & Williams 2004) is 
frequently used. We show, however, that the sampling of 
this algorithm is not uncorrelated. We address these details 
and related issues in Section 3 and suggest an alternative 
based on the Metropolis-Hastings algorithm in Section 4. In 
Section 5 we discuss the implementation and demonstrate 
in Section 6 that even for high dimensions we are able to 
sample our solution space to achieve a uniform uncorrelated 
random sample. We also achieve significant speed improve- 
ments over PixeLens. In Section 7 we discuss future work 
and applications. 



such as the mass must be positive everywhere, variations 
in k must be smooth, and the local density gradient must 
point within 45° of the center. A complete discussion can be 
found in Coles (2008). Our choice of constraints constructs 
a non-empty compact solution space S, which is a convex 
polytope, or simplex. By our definition of the solution space, 
the Bayesian posterior distribution is 

. . / 1 if x e S , . 

P( - X)(X \ XxtS (8) 

since all x £ S are equally probable. We are interested in an 
uncorrelated uniform random sample drawn from S, which 
we will simply refer to as a random sample. 



2 FRAMEWORK 

There are two primary equations in gravitational lensing 
(Blandford & Narayan 1986; Schneider et al. 1992; Schneider 
2006). The lens equation 

13 = 6- VV(0) (3) 

maps an observed position to an unobservable source po- 
sition (3 through the potential 

^{Q) = - [ «(0')ln|0-0'|d0' (4) 

where k is the dimensionless projected mass density of the 
lensing object. The Fermat potential 

r(0) = i|0-/3| 2 -V(0) (5) 

measures, up to an affine transformation, the time a photon 
takes to travel from the source to the observer. Eq. (3) cor- 
responds to the stationary points of Eq. (5). If the source 
varies in brightness and the arrival times are different for dif- 
ferent images then one can measure the physical time delay 
At2i between the light curves of 6± and 02- 

As in Saha & Williams (2004), we discretize n into grid 
cells, or pixels, centered on the lensing object and construct 
a system of linear equations from At oc At and Eq. (3) : 

Cx = d (6) 

where C £ R pxfe , d is a constant vector, and x is a vector 
containing the free parameters k and (3. These equations 
only serve to reduce the dimension of the problem k by the 
number of equalities p since in general p ^ k, where p is equal 
to twice the number of observed images plus the number of 
measured time delays. This reduction is performed with the 
orthogonal projection 

proj(x) = (1 - C + C)x + C + d (7) 

which takes a point x to the solution set of Eq. (6). The 
matrix C + is the Moore-Penrose pseudoinverse of C. A basis 
of this affine space is given by those eigenvectors of (1 — 
C + C) with eigenvalue equal one. We can therefore, without 
loss of generality, take this reduced space of dimension n — 
k — p to be our problem domain. 

Since the space is unbounded, we must impose con- 
straints in the form of Eq. (1) to limit ourselves to rea- 
sonable and physical solutions. These constraints may de- 
rive from data, such as arrival time order or image parity, 
or from Bayesian priors. We consider only modest priors, 



3 REVISITING THE PIXELENS ALGORITHM 

Earlier work used the program PixeLens (Saha & Williams 
2004) to model gravitational lenses and estimate the Hubble 
Time Hq 1 . The sampling strategy employed in PixeLens is a 
type of random walk explained in detail in Saha & Williams 
(2004) and Coles (2008). Here we summarize the algorithm 
and discuss some problems. 

To build a set of sample points X — {xi, X2, • • • } one 
begins by selecting a set of vertices V = {vq, vi, . . . } of S. 
The first sample point x\ is chosen uniformly from the chord 
connecting vo,vi. Each new point x% with i > 1 is chosen 
randomly and uniformly from the chord from v% through 
Xi-i to the boundary of S. In the limit of infinite samples 
this algorithm will explore the entire simplex. 

To construct V, PixeLens uses the simplex algorithm 
(Dantzig 1963; Press et al. 2007). This algorithm was de- 
signed to maximize (or equivalently minimize) a linear ob- 
jective function f(x) subject to linear constraints as in 
Eq. (1) by moving from vertex to vertex of the simplex in a 
direction that always increases /. It is a standard algorithm 
in the field of linear programming, where the vertex that 
maximizes / is the desired result. Finding a particular vertex 
is not the goal of PixeLens and so each vertex v% is found by 
maximizing a random objective function f(x) = c • x where 
c = (ci, . . . , c n ) is a random vector with uniform a £ [—1,1]. 

One issue is that randomly choosing an objective func- 
tion does not randomly choose a vertex. If the simplex is 
not a regular polytope there will be some vertices that are 
chosen with a higher probability than others. This is demon- 
strated in Figure 1 with a simplex in seven dimensions with 
32 vertices. The vertices have been enumerated and sorted 
by the number of times they were chosen. Clearly some ver- 
tices are highly preferred. Even in high dimensions where 
the choice of a particular vertex is unlikely to occur again, 
vertices that are particularly acute will be more likely. By 
not choosing vertices at random the algorithm prefers some 
regions over others which leads to correlations in the final 
sample and not all directions will be adequately explored. 
Vertex selection is also not invariant under some general co- 
ordinate transformation x Tx for an invertible matrix 
T G K nXn . Even a change in units may affect the sample 
distribution and therefore the inferred physical parameters. 

Correctly choosing vertices at random is itself a diffi- 
cult problem. There exist several methods to enumerate the 
vertices (e.g., Avis & Fukuda 1992; Dyer 1983) but unfor- 
tunately the number of vertices has a huge combinatorial 
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Figure 1. In the PixeLens sampling some vertices are preferred to others when choosing a random objective function. This is shown 
here using a simplex in seven dimensions with 32 vertices. The red curve is the number of times N a particular vertex was chosen using 
PixeLens. For comparison, a purely random choice of the same vertices produces the black curve. 
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4 A NEW MCMC PROPOSAL DENSITY 



m + [^n\ \ I m + [|n] — 1 



+ 



2' 

m 



(9) 



where m is the number of inequalities (McMullen & Shep- 
hard 1971). 

To avoid the enumeration we modified the simplex algo- 
rithm to randomly walk between the vertices. This dispenses 
with the objective function and simply selects a neighbor- 
ing vertex to which to move. While this does improve the 
sampling of the vertices (see Figure 1), it is not without its 
own problems. If a vertex has many close neighbors, as is 
likely in high dimensions, the random walk will tend to stay 
in one region before moving large distances. To compensate, 
one must run for a long time. The process of moving to a 
new vertex is computationally costly, however, and incurs 
numerical error that quickly dominates after too many iter- 
ations. 

Another issue is that the PixeLens algorithm is based 
on the vertices of S. The algorithm produces samples that 
do not follow the target probability distribution function 
P(x), even if the vertices are chosen randomly. In Figure 2 
we demonstrate this for a 100 dimensional hypercube and n- 
ball, where the vertices have been randomly selected a priori 
to avoid the PixeLens vertex selection algorithm. In the case 
of the n-ball we have chosen a random set of points on the 
surface to be the vertices. Points chosen from the hypercube 
tend to lie in the corners, while points from the n-ball are 
more closely clustered in the center. In general, the points 
tend to clump along the chords connecting vertices. In both 
cases the PixeLens sampling is markedly different than a 
random sample, although the means are nearly identical. 

The sample should be uniformly distributed in S in or- 
der to be able to perform statistical analysis on it. For this 
reason, we chose to explore an alternative method based on 
a random walk that does not depend on the vertices of the 
simplex. 



The Metropolis-Hastings algorithm (Metropolis et al. 1953; 
Hastings 1970) is a well known method to sample the prob- 
ability distribution P{x) by generating a Markov chain 
X = {xi,X2, . . . }. A sample x' is selected from a proposal 
density function Q(x',Xi) given the current sample x% and 
if 

P{x>)Q^x>) 
^ P( Xi )Q(x',Xi) K } 

where ^ a ^ 1 is chosen from a uniform distribution, then 
x' is accepted and Xi+i = x' . If x' is rejected the current 
point is duplicated as x%-^-\ — x% . We will assume that Q is 
symmetric, i.e., Q(x\xi) = Q(xi,x), so that the chance of 
moving from x\ to Xi + \ is the same as moving from Xi+i to 

Xi. 

One possibility for Q is to simply move by an arbitrary 
amount in a random direction but the chain may become 
trapped in narrow regions, especially in high dimensions. 
To account for the shape of S, Q is often taken to be a mul- 
tivariate Gaussian distribution J\f(xi, X), where XI is the co- 
variance matrix of the sample X. This matrix is an estimate 
of the covariance matrix X of S and can be progressively 
calculated as the chain is built. Such adaptive chains are no 
longer Markovian because the reversibility is broken, but one 
can run an initial adaptive burn-in phase before beginning 
the Markovian chain with X fixed at its last value. 

Selecting x' from A/"(x^,X) is equivalent to selecting 
y from the distribution A/"(0, l) 1 and setting X — Xi -\~ 
EA 1 ^ 2 ?/', where E is the matrix of the eigenvectors e 3 - of X 
and A is the diagonal matrix with the corresponding eigen- 
values Xj. In other words, we move along a randomly se- 
lected direction accounting for the shape of S through the 
eigenvalues. 

For a reasonable estimate of X, particularly in high di- 
mensional spaces, it is important to have ^> n points. When 
only > n points are known, some of the eigenvalues of X are 
strongly underestimated simply due to poor sampling of the 

1 The probability density function for y' is f(y') = 
(27t) _ * e~^ y '^ . 
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Figure 2. A comparison of sample sets Xp and Xp for the PixeLens sampling algorithm (red) and random sampling (black), 
respectively. Each set has 100,000 items. The means of the plotted distributions (vertical lines) for both the hypercube and n-ball 
samples are nearly identical, whereas the deviations are not. (Top left) We take the sample space S to be a unit cube centered at the 
origin in 100 dimensions and show the probability distribution function of r = \x\ for all x £ X. (Top Right) The PDF for the single 
coordinate x 1 of each sample point. Marginalizing over the other coordinates we expect the probability to be uniformly one. Points 
from Xp clump along the chords connecting vertices. (Bottom left) Similarly, we plot the PDF where S is a n-ball. (Bottom right) 
Marginalizing out a single coordinate we see that a random sample has a broader distribution. 



space. Even if true random samples were to be drawn di- 
rectly from S, the shape would be incorrectly estimated. In 
Figure 3 all the eigenvalues of a 100 dimensional cube should 
be equal, but for small sample sizes \X\ this is clearly not 
the case. Poorly estimated eigenvalues cause the standard 
proposal density to undersample S in the direction of the 
corresponding eigenvectors, even as new points are added to 
the chain; the new points reinforce the bias that was present 
in the original X. 

The key improvement from this paper is to use the con- 
straint information of Eq. (1) to hint at the shape of S 
and achieve a reasonable proposal density despite having 
a small sample size. We do this in the following way. Let 
X = {xi, X2, • • • , Xk} with k ^ n + 1 be a set of points in 
S C K n . These points may be chosen in any fashion, but 
the sample covariance matrix 5] must be invertible, i.e., all 
Xi's do not lie on the same hyperplane of R n , and thus the 
set of the eigenvectors of is an orthonormal basis of R n . 
Since S is convex, the mean (X) will also be in S. Extend- 
ing the eigenvector ej from (X) intersects the boundary of 
S at two points: one in the positive and one in the negative 
direction. The distance dj between each pair of boundary 
points is taken as an estimate of the size of S along ej. 

Our modification takes Q to be the multivariate Gaus- 
sian distribution Af(xi, X), where XI = EDE T and D is the 
diagonal matrix of the new cr 2 = dj/ 12. We therefore select 
x — Xi + ED 1 / 2 ?/. The ellipsoidal shape of Q is thus ad- 



justed by substituting A with D to better approximate the 
shape of S. In Figure 4 we show schematically the modi- 
fication. The initial set of points inadequately samples the 
horizontal direction and therefore the eigenvalues Ai,A2 of 
the covariance matrix are small in that direction. The new 
distances d\ , d<2 are a much better approximation and are 
taken in place of the eigenvalues. This strategy encourages 
the movement along directions that have been poorly sam- 
pled and therefore have a very small variance. This is an 
important distinction to so-called Hit-and-Run algorithms 
(Belisle et al. 1993) that step in a random direction and may 
require many iterations to move through narrow spaces. 

Metropolis algorithms in high dimensions can often be 
inefficient. If the average step length is too big, almost all 
proposals will fall into low probability regions and be re- 
jected, whereas if the step length is too small, almost all 
proposals will be accepted but sampling the space will be 
very slow. The optimal is somewhere in between and can be 
reached if we regulate the step length by multiplying the a 's 
with some scaling constant c such that the acceptance rate is 
roughly 25% (Gelman et al. 1996). In our case P{x) is not a 
multivariate Gaussian distribution and its shape varies from 
case to case and thus there is no single c. However, we found 
that for c ~ 1/n the acceptance rate remains reasonable. 

Assuming c — 4/n, the random walk will typically 

translate to an average distance of t\ = 4 (^2j ^ /n 2 ^j 
after one step and t s = tiy/s/2 after s steps assuming an 
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Figure 3. The sorted eigenvalues of the sample covariance matrix of three sets of random points in a 100 dimensional cube with side 
length \f\2. The set size is \X\. With infinite samples we expect all eigenvalues to be equal to unity, but due to the high dimensionality, 
a random sample does not fully estimate each direction, especially for low \X\. 



acceptance rate of 25%. In order to produce two uncorre- 
cted points in S, we must make s — N s steps such that the 
average traveled distance is of the order of the simplex diam- 
eter D. In the case of the hyperrect angle or similar shaped 
simplices 



£>~2^X>, 2 = §ti=t„ a (11) 

and therefore N s — 0(n 2 ) steps are needed. In other cases, 
such as a regular n-simplex, where the approximation for 
D in Eq. (11) does not hold, N s > 0(n 2 ) steps may be 
required. This is not a typical scenario however in lensing, 
as we demonstrate in Section 6. 

The strength of this algorithm is that it is not sensitive 
to the dispersion of the starting set of points X but only 
to its mean (X). When (X) is not a good estimate of the 
center of S, the algorithm will need some time to remove 
the starting bias. 



5 IMPLEMENTATION 

We have implemented our modified sample strategy in a new 
gravitational lens modeling framework called GLASS. This 
framework is specifically designed for free-form lens model- 
ing and to allow for easy modification of modeling strategies 
and priors. Furthermore, we are able to immediately test the 
implementation by comparing with lensing theory and pub- 
lished results. 

As discussed in the previous section, the proposal den- 
sity Q depends on an estimate for the size of the simplex 
S. We estimate the size by measuring the distances dj from 
the current sample chain mean (X) to the boundary of S 
following the estimated eigenvectors ej . These diameters are 
best estimated if (X) is close to the simplex mean (S) and 
the vectors ej are aligned with the true eigenvectors of S. 

As is often done (Press et al. 2007), the Markov chain is 
restricted to move coordinate- wise along the eigenvectors ej , 
rather than in a random direction. We rotate S such that 
these eigenvectors coincide with the standard basis. This 
provides a significant performance improvement since only 



one coordinate needs to be updated. In addition, the con- 
straints Eq. (1), typically numbering a few times n in lens- 
ing, must only be checked in one coordinate. After N s steps 
the last point is rotated back into the original coordinate 
system and appended to X. With N s — 0(n 2 ) the running 
time to produce one sample is 0(n 3 ). While PixeLens also 
has a theoretical running time of 0(n 3 ) our implementa- 
tion has a reduced scaling constant resulting in significant 
performance gains. 

Our implementation begins by finding the point xo 
where the temporary variable t is maximized subject to 
Axo + t ^ b. For this we use the simplex algorithm but 
any linear programming algorithm will suffice. The point xo 
is inside S and in some sense "far" from the boundaries. 

Initially, the chain walks along the eigenvectors of the 
matrix 1 — C + C. The chain is run for a burn-in phase where 
we collect Nb samples. We typically let Nb = lOn. After the 
first 2n samples, and subsequently after each Nb/ 10 samples, 
we updated Q by calculating 5] and then continue walking 
along the respective eigenvectors. The scaling constant c is 
adjusted to hold the acceptance rate around 25%. 

After the burn-in phase, we fix 5] and c at their final 
values and run a new chain for as many samples as are de- 
sired. In both phases we can run several Markov chains in 
parallel as long as we ensure that all threads use the same 
X. We have tested this on a shared memory machine using 
up to 48 CPUs. 

As we move to higher dimensional spaces we expect that 
accumulation of round-off error will tend to produce depar- 
tures from the equality constraints in Eq. (6). To remove 
this numerical error we ensure that a sample point x lies 
on the simplex by using Eq. (7) to project it back onto the 
simplex. 



6 SAMPLING EVALUATION 

The stationary distribution of the sample set X from any 
general MCMC strategy will be the target probability dis- 
tribution P{x). This is only reached though for a sample 
size much greater than n. In practice this is not feasible 
in high dimensions and we must limit our sample size to 
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Figure 4. Our modification to the choice of proposal density Q is show schematically. The eigenvectors and corresponding eigenvalues 
Ai = /f,A2 = l\ of the covariance matrix of an initial set of points do not accurately capture the shape of the simple S. This is a 
problem only for high dimensions. We show a two dimensional drawing for simplicity. Since the boundaries of S are known, we extend 
the eigenvectors until we reach the boundary. We substitute the eigenvalues with d^/12. 



> n. As we demonstrated in Figure 3 the eigenvectors of a 
small random sample will not be able to fully describe the 
solution space but for lensing statistics this is sufficient as 
we typically marginalize over many parameters. As we want 
to obtain a uniform uncorrelated random sample, we com- 
pare our MCMC implementation to a random sample from 
a hyper rectangle and a regular n-simplex in 100 dimensions, 
while varying N s to measure convergence. 
The hyper rectangle 

H n = {(xi,X2,...,X n ) G | < Xj ^ l/j} (12) 

is straightforward to sample directly and the n-simplex 

S n = {(X1,X 2 , • • • ,X n+ i) G M n+1 | < Xji^^Xj = 1} (13) 

3 

is only slightly more involved 2 . We expect the simplex of 
a real lens system to be similar to H n since it has 2n in- 
equalities and 2 n vertices. In addition, we also test S n as an 
extreme example. We repeated the tests of PixeLens from 
Section 3 using the hypercube and S n with N s — n 2 and 
N s = n 2-5 , respectively, and show the results in Figure 5. 
We are able to match the expected distributions of a ran- 
dom sample perfectly. 

We test the global properties of our samples Xh , Xs 
against the random samples Rh , Rs by comparing the eigen- 
values of the respective sample covariance matrices. Each 
sample set contains 1000 points. In Figure 6 we show the 
sorted eigenvalues for these samples. As expected from 
Eq. (11) the hyper rectangle converges at N s ~ n 2 . The n- 
simplex requires a larger N s , as mentioned in Section 4, 
because the estimate of the diameter from Eq. (11) is no 
longer valid. In the right panels, we have taken several hun- 
dred random sample sets and plotted the la deviations. 

2 To generate a uniform random point in the regular n-simplex 
S n choose an (n + l)-vector V of i.i.d. numbers drawn from an 
exponential distribution. Then V' = V/ Vj is a point in S n 
(Devroye 1986). 



The plots are normalized to the mean of these random 
sets. The volume of a simplex can be well approximated 
by (det$]) 1//2 = Ylj Table 1 shows the volumes of all 

the computed samples. The uncertainties have been calcu- 
lated from 1000 independent runs. Our sampling strategy is 
in excellent agreement with random samples. 

While the eigenvalues paint a global picture, we also 
tested the local properties of our sample. In particular, we 
looked at the distribution of nearest neighbor distances. In 
Figure 7 we show this distribution for H n and S n and for 
increasing N s . A misalignment with a random sample or 
multiple peaks are indicators of a correlated sample, such as 
clumped points. The chains with too low N s are unable to 
traverse across the simplex resulting in sample points which 
are too close to each other. By the time the eigenvalues 
converge the local distribution also converges. 

Finally, we tested our implementation with a simulated 
triaxial lens mass. The four image positions and respective 
time delays were calculated using a root finding algorithm 
built into GLASS. We supplied the value of ifo, all the time 
delays, and all the image positions to the algorithm without 
error bars to test the effectiveness of the method. The prob- 
lem, however, still remains heavily underconstrained. In the 
near future, we will explore in detail the effects of relaxing 
these assumptions in a variety of different lens systems. 

For the reconstruction, we used a grid of 225 pixels but 
we assumed radial symmetry to reduce the number of inde- 
pendent pixels to 113. Together with the unknown source 
position the problem lies on a 104 dimensional affine space. 
The reconstructed average arrival time surface and images 
are shown in the left panel of Figure 8. The right panel com- 
pares the inferred surface density profile with uncertainties 
(black error bars) to the original lens profile. The constraint 
information on the mass profile is contained within the im- 
age positions (vertical lines) and therefore the pixels outside 
are not expected to be well fit. In general, though, this re- 
construction is excellent where the information content is 
highest. As discussed in Section 1, by sampling the solution 
space, we are able to explore the degeneracies which sim- 
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Figure 6. A convergence test of the MCMC sample eigenvalues Xj with increasing N s . For comparison, several hundred random sets 
were generated. The average of the random sets is shown in the left plots. The right plots normalize the MCMC eigenvalues to the mean 
of the random values. The grey band shows the la deviations of the random sets. The hyperrectangle (top) converges by N s ~ n 2 as 
predicted from Eq. (11), but the n-Simplex (bottom) with its different shape requires a larger N s ~ n 2-5 . 
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Figure 7. The probability that the nearest neighbor of a sample point is at a given distance in the hyperrectangle (left) and n-simplex 
(right). The histograms produced by GLASS show similar convergence as in Figure 6. 
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Figure 8. The reconstructed lens model of a source producing four images. (Left) The arrival time surface of the lens. The images 
(dots) are at the stationary points of the surface. (Right) The radial density profile of the original lens mass (grey) compared to the 
ensemble of possible models produced from the MCMC sampling (black error bars). The boxes (red) show the results obtained from 
PixeLens for the same lens where the error estimates clearly favor shallower models suggesting that the old algorithm over-samples some 
regions of the parameter space. The radial image positions are marked by vertical lines. 





0.0 0.5 1.0 1.5 2.0 2.5 

Nearest Neighbor Distance 



Figure 9. A convergence test of the sample eigenvalues (left) and the probability of nearest neighbor distances (right) using the 
realistic lens model depicted in Figure 8. The number of steps N s used in the MCMC walk is increased until the histograms of nearest 
neighbor distances converge. An accept/reject random sample is impossible to obtain given the unknown shape of the simplex. The left 
plot has been normalized to the N s = n 2-5 case. For N s > n 2 the eigenvalues are stable. 
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Table 1. Estimated volumes for the tested simplices of H n , S n , and the gravitational lens for various 
values of N s , demonstrating convergence. The volumes are calculated from the eigenvalues shown in 
Figure 6 and Figure 9. The column labeled random is calculated via direct random sampling, which 
is not possible for the lens. Where the value is red, the volume is underestimated compared with the 
random sample. 



n 1 ' 5 n 2 n 2,5 n 3 Random 

H n 0.244 ±0.044 9.1 ± 1.5 8.9 ± 1.4 9.0 ± 1.4 xlO - 

S n 0.094 ±0.043 1.64 ±0.51 1.62 ± 0.46 1.63 ± 0.52 xl0~ 

Lens (9.5 ±8.5) x 10~ 8 0.94 ±0.19 1.21 ± 0.24 — xlO - ' 



pie models cannot. For comparison, we also show the results 
for the same lens obtained using PixeLens (red boxes). Al- 
though the results are similar, the PixeLens error estimates 
favor shallower or nearly flat models, again suggesting that 
the old algorithm over-samples some regions of the parame- 
ter space as shown in the upper-right panel of Figure 2. 

We also performed the same eigenvalue and nearest 
neighbor analysis as before but because we are unable to 
directly sample the solution space we can only change N s 
until we converge. As expected and shown in Figure 9 we 
converge when N s ~ n 2 . 



7 OUTLOOK 

Our novel proposal strategy for the Metropolis-Hastings al- 
gorithm allows sampling of general convex poly topes in 100 
or more dimensions. We have implemented an efficient par- 
allel version of the algorithm in the gravitational lens model- 
ing framework GLASS so that we may apply the strategy to 
large lensing problems exceeding 1000 dimensions. GLASS 
will be publicly available for download in the near future. 

Several future applications are possible. Multiple red- 
shift sources carry information of the cosmological distances, 
which in turn depend on the cosmological parameters. Con- 
sidering the statistical dispersion of the parameter space, 
one could in principle be able to infer the cosmological pa- 
rameters in a Bayesian framework. In order to achieve this, 
a uniform sample of the solution space is needed. 

Previous work on estimating the Hubble Time has used 
systems of up to eighteen lenses (Paraficz & Hjorth 2010). 
New lenses can be included to further constrain this value 
but each additional lens increases the dimensionality of the 
space by ~ 100 making this current work essential for such 
upcoming projects. 
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